Minimizing regret in repeated play of games

Author

  • Uriel Feige
Abstract

These notes are largely based on a survey on learning, regret minimization and equilibria written by Avrim Blum and Yishay Mansour, and appearing as Chapter 4 of [NRTV07]. The main differences are in the presentation rather than in the content, and in the inclusion of Section 1.5. The notes correspond to two 2-hour lectures in the course on algorithmic game theory given at the Weizmann Institute in May 2013.

Suppose player P is faced with a game for which P knows his own payoff matrix, but does not know the payoff matrices of the other players (or equivalently, knows payoff matrices for the other players, but has no reason to assume that the other players are rational). In this general setting, one may represent (essentially without loss of generality) all remaining players as one player Q, and hence the game may be assumed to be a 2-player game. How should P play? If P has a dominant strategy, then playing it is a natural policy. But in the absence of dominant strategies, solution concepts such as Nash equilibrium involve the payoffs of both players, and hence P would not know which strategy corresponds to such a solution concept. Moreover, it might be the case that Q is not willing to play some of the strategies available to Q (e.g., they might be dominated by other strategies), so it could be that the payoff matrix for P (say, as a row player) contains columns that are not really part of the game. In the full generality of this situation, it is difficult to give P good advice on what to play.

The content of these notes is a variation on the above situation, in which some basic game G (for which P knows only his own payoff matrix) is repeated for T rounds, and P wishes to maximize the sum of expected payoffs from all rounds. In this situation, P does have hope of playing intelligently, because at any given round P knows what Q played in previous rounds, and P can use this information in deciding what to play in the given round.
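The repeated setting described above can be illustrated with a small sketch. The abstract does not name a particular algorithm, so the multiplicative-weights (Hedge) rule used here is an assumption chosen only as a standard example of regret minimization; the function name `hedge_play`, the learning rate `eta`, and the payoff range [0, 1] are likewise hypothetical choices. P sees only his own payoff matrix and the columns Q played in earlier rounds, and the sketch reports P's expected total payoff together with his external regret relative to the best fixed row in hindsight.

```python
import math


def hedge_play(payoff, q_moves, eta):
    """Sketch of a multiplicative-weights rule for the row player P.

    payoff[i][j] is P's payoff (assumed to lie in [0, 1]) when P plays
    row i and Q plays column j. q_moves lists the columns Q plays, one
    per round; P only uses a column after that round has been played.
    Returns (expected total payoff of P, external regret of P).
    """
    n = len(payoff)
    weights = [1.0] * n
    total = 0.0
    row_totals = [0.0] * n  # what each fixed row would have earned
    for j in q_moves:
        # P's mixed strategy for this round depends only on past rounds.
        s = sum(weights)
        probs = [w / s for w in weights]
        total += sum(p * payoff[i][j] for i, p in enumerate(probs))
        # After the round, Q's column j is revealed and weights are updated.
        for i in range(n):
            row_totals[i] += payoff[i][j]
            weights[i] *= math.exp(eta * payoff[i][j])
    regret = max(row_totals) - total
    return total, regret


# Hypothetical usage: a 2x2 game in which P earns 1 when he matches Q's column.
T = 100
payoff = [[1.0, 0.0], [0.0, 1.0]]
q_moves = [0] * 60 + [1] * 40  # an arbitrary sequence chosen by Q
eta = math.sqrt(math.log(len(payoff)) / T)
total, regret = hedge_play(payoff, q_moves, eta)
```

With this choice of `eta`, the standard analysis of this rule bounds the regret by a constant times the square root of T ln n (n being the number of rows), so the average regret per round vanishes as T grows, whatever sequence Q plays.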


Similar resources

Towards Minimizing Disappointment in Repeated Games

We consider the problem of learning in repeated games against arbitrary associates. Specifically, we study the ability of expert algorithms to quickly learn effective strategies in repeated games, towards the ultimate goal of learning near-optimal behavior against any arbitrary associate within only a handful of interactions. Our contribution is three-fold. First, we advocate a new metric, call...


Minimizing regret: the general case

In repeated games with differential information on one side, the labelling "general case" refers to games in which the action of the informed player is not known to the uninformed, who can only observe a signal which is the random outcome of his and his opponent's action. Here we consider the problem of minimizing regret (in the sense first formulated by Hannan [8]) when the information available...


Efficient Regret Minimization in Non-Convex Games

We consider regret minimization in repeated games with non-convex loss functions. Minimizing the standard notion of regret is computationally intractable. Thus, we define a natural notion of regret which permits efficient optimization and generalizes offline guarantees for convergence to an approximate local optimum. We give gradient-based methods that achieve optimal regret, which in turn guar...


Adaptive Regret Minimization in Bounded-Memory Games

Online learning algorithms that minimize regret provide strong guarantees in situations that involve repeatedly making decisions in an uncertain environment, e.g. a driver deciding what route to drive to work every day. While regret minimization has been extensively studied in repeated games, we study regret minimization for a richer class of games called bounded memory games. In each round of ...


Robust approachability and regret minimization in games with partial monitoring

Approachability has become a standard tool in analyzing learning algorithms in the adversarial online learning setup. We develop a variant of approachability for games where there is ambiguity in the obtained reward that belongs to a set, rather than being a single vector. Using this variant we tackle the problem of approachability in games with partial monitoring and develop simple and efficie...


4 Learning, Regret Minimization, and Equilibria

Many situations involve repeatedly making decisions in an uncertain environment: for instance, deciding what route to drive to work each day, or repeated play of a game against an opponent with an unknown strategy. In this chapter we describe learning algorithms with strong guarantees for settings of this type, along with connections to game-theoretic equilibria when all players in a system are...



Publication date: 2013